[PWGJE] Add SlimTables data model and producer for future standalone embedding studies #15132

Draft
louisemillot wants to merge 8 commits into AliceO2Group:master from louisemillot:SlimTablesEmbedding

@louisemillot

This PR introduces a new slimmed data model together with a dedicated producer workflow in PWGJE. The objective is to provide a reduced set of AOD tables containing only the minimal information needed for standalone embedding studies and derived AO2D production.
These derived datasets are intended to be used as reduced inputs for further jet substructure analyses, including studies based on FastJet algorithms. The goal is to reduce dataset size while keeping the essential information required for substructure reconstruction and analysis workflows.

@github-actions bot changed the title from "Add SlimTables data model and producer for future standalone embedding studies" to "[PWGJE] Add SlimTables data model and producer for future standalone embedding studies" on Feb 23, 2026
@nzardosh
Collaborator

Hi Louise, just to understand: why do you use this instead of just using the JTracks table (which is also smaller)?
In any case, you are of course welcome to make your own tables for studies, but please move the table definitions into your task rather than into the datamodel folder.

@louisemillot
Author

Hi Nima, thanks for the clarification. I initially thought the table definitions should go into the datamodel folder, but I’ve now moved them directly into the task as suggested.
Please let me know if this looks fine!

Collaborator

@nzardosh left a comment

Adding some more comments here. I would like to reiterate that, to me, trying to do embedding locally on a large scale is quite difficult and will need a lot of work plus a lot of dedicated computing resources, so in my opinion it is not advisable. Of course it is up to you if you wish to give it a try.

Filter particleCuts = (aod::jmcparticle::pt >= minPt && aod::jmcparticle::pt < maxPt && aod::jmcparticle::eta > minEta && aod::jmcparticle::eta < maxEta);

void processData(soa::Filtered<o2::aod::JetCollisions>::iterator const& collision,
soa::Filtered<soa::Join<aod::JetTracks, aod::JTrackExtras, aod::JTrackPIs>> const& tracks)

Collaborator

why are you using the JTrackExtras table here?

Collaborator

this hasn't been resolved

Author

I need the JTrackExtras table here because it contains the dcaZ information. Without it, selectTrackDcaZ cannot access dcaZ, which is required for track selection.

Collaborator

is there not already a dcaZ cut applied for globalTracks? Or do you need a different value?

Author

I checked the getGlobalTrackSelection function, and indeed it already applies a dcaZ cut and a minimum number of crossed TPC rows. Since I’m also requesting these same cuts explicitly in my code, it’s technically not necessary to use JTrackExtras for these selections, unless I want to study systematics with a different dcaZ or track quality cut.


slimCollisions(collision.posZ());
auto slimCollIndex = slimCollisions.lastIndex();
for (const auto& track : tracks) {

Collaborator

are you not intending to have any track selection at all? This way you are going to take all TPC-only tracks as well, and many other tracks that are not good

Author

I’m in the process of implementing a proper track selection

Collaborator

why do you not wish to use globalTracks?

slimtracks::Px,
slimtracks::Py,
slimtracks::Pz,
slimtracks::E);

Collaborator

if you have pT, eta and phi then you don't need Px, Py, Pz and energy stored explicitly like this. You can have them as dynamic columns.

Author

In my case, I removed (pt, eta, phi) and kept (px, py, pz, E), since the standalone workflow relies on FastJet, which directly uses Cartesian 4-vectors as input

Collaborator

do you need E at detector level if you always assume the pion mass?
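
The relation behind this thread can be written out independently of the framework. Below is a minimal standalone sketch, assuming a fixed charged-pion mass hypothesis at detector level. `FourVector`, `toCartesian` and `kPionMass` are hypothetical names used only for illustration; they are not O2Physics or FastJet API.

```cpp
#include <cmath>

// Hypothetical plain struct for illustration; not an O2 or FastJet type.
struct FourVector {
  double px;
  double py;
  double pz;
  double e;
};

// Charged-pion mass in GeV/c^2 (rounded); the assumed mass hypothesis.
constexpr double kPionMass = 0.13957;

// Rebuild the Cartesian four-vector from (pt, eta, phi) and a mass hypothesis.
inline FourVector toCartesian(double pt, double eta, double phi, double mass = kPionMass)
{
  const double px = pt * std::cos(phi);
  const double py = pt * std::sin(phi);
  const double pz = pt * std::sinh(eta);
  const double p = pt * std::cosh(eta); // |p| = pt * cosh(eta)
  const double e = std::sqrt(p * p + mass * mass);
  return {px, py, pz, e};
}
```

Under this assumption, three kinematic variables per track are enough at detector level, since E follows from the mass hypothesis. At particle level the mass is not fixed, so the energy (or the mass) has to be stored to build a proper four-vector.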

slimparticles::Phi,
slimparticles::Px,
slimparticles::Py,
slimparticles::Pz);

Collaborator

since you don't assume the pion mass here, why don't you save the actual energy? Otherwise you can't make a proper 4-vector

Author

I’m now saving the actual energy of the particle as well - I just forgot it

PROCESS_SWITCH(SlimTablesProducer, processData, "process collisions and tracks for Data and MCD", false);

void processMCD(soa::Filtered<aod::JetCollisionsMCD>::iterator const& collision,
soa::Join<aod::JetMcCollisions, aod::JMcCollisionPIs> const&, // join the weight

Collaborator

why is JMcCollisionPIs used?

Collaborator

this is still there

auto slimCollIndex = slimCollisions.lastIndex();
slimCollisions(collision.posZ());
for (const auto& track : tracks) {
float mass = jetderiveddatautilities::mPion;

Collaborator

all the above comments in the data part apply here too

float mass = jetderiveddatautilities::mPion;
float p = track.pt() * std::cosh(track.eta());
float energy = std::sqrt(p * p + mass * mass);
slimTracks(slimCollIndex, track.pt(), track.eta(), track.phi(), track.px(), track.py(), track.pz(), energy);

Collaborator

also in MC you now have a mcd table for collisions and a mcp table but there is no way to link them. How will you perform your matching between mcp and mcd?

Collaborator

I still don't see any matching

Author

There is currently no MCP–MCD matching implemented in this task.
This is intentional. The goal of this workflow is only to produce reduced (slim) datasets, not to perform the embedding or matching within O2Physics.

The matching is done later in a standalone workflow:

- I build hybrid jets by combining pp MC tracks with Pb–Pb data tracks
- Then I perform the matching between these hybrid jets and the MC truth (pp) jets/particles

So the matching is done at the jet level (hybrid ↔ MC truth), not at the MCP–MCD table level inside O2.
For this reason, no explicit MCP–MCD linking is implemented here.

Collaborator

but how will you know in your framework later which particle-level collision and which detector-level collision go together? I don't understand how you will match your mcd and mcp jets without this information

Author

I’ll implement the geometric matching myself, similar to how it’s done in JE, to associate the MCP and MCD jets correctly
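
Geometric matching of this kind is commonly done by pairing each jet with the closest candidate in ΔR within a maximum distance. The following is a minimal standalone sketch under that assumption, with jets reduced to (eta, phi). `Jet`, `deltaPhi`, `deltaR` and `closestMatch` are hypothetical names, and this is not the PWGJE matching code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal jet: only the axis direction is kept.
struct Jet {
  double eta;
  double phi;
};

// Azimuthal difference wrapped into [-pi, pi].
inline double deltaPhi(double phi1, double phi2)
{
  const double pi = std::acos(-1.0);
  double d = std::fmod(phi1 - phi2, 2.0 * pi);
  if (d > pi) {
    d -= 2.0 * pi;
  } else if (d < -pi) {
    d += 2.0 * pi;
  }
  return d;
}

inline double deltaR(const Jet& a, const Jet& b)
{
  return std::hypot(a.eta - b.eta, deltaPhi(a.phi, b.phi));
}

// Index of the closest candidate with deltaR < maxDeltaR, or -1 if none.
inline int closestMatch(const Jet& probe, const std::vector<Jet>& candidates, double maxDeltaR)
{
  int best = -1;
  double bestDr = maxDeltaR;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    const double dr = deltaR(probe, candidates[i]);
    if (dr < bestDr) {
      bestDr = dr;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Such a comparison is only meaningful when `probe` and `candidates` come from the same simulated event, so the candidate list has to be selected per event before calling it.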

Collaborator

I think you have misunderstood me. Imagine you have 1000 detector level events where you have saved collisions and 1000 particle level events where you have saved mcCollisions. Then you run mcd jets in the detector level ones and mcp jets in the particle level ones. How do you know for a given detector level jet, which particle level event's jets you need to look at for the matching?
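
One way to keep this association available downstream is to store, in each detector-level collision row, the index of its particle-level collision. The sketch below uses toy structs to show the idea. `SlimMcCollision`, `SlimCollision`, `mcCollisionId` and `findMcCollision` are hypothetical names for illustration, not the tables defined in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical particle-level collision row.
struct SlimMcCollision {
  float posZ;
  float weight;
};

// Hypothetical detector-level collision row carrying a link to its
// particle-level collision (-1 if there is none).
struct SlimCollision {
  float posZ;
  std::int32_t mcCollisionId;
};

// Return the linked particle-level collision, or nullptr if the link is
// missing or out of range.
inline const SlimMcCollision* findMcCollision(const SlimCollision& coll,
                                              const std::vector<SlimMcCollision>& mcCollisions)
{
  if (coll.mcCollisionId < 0 ||
      static_cast<std::size_t>(coll.mcCollisionId) >= mcCollisions.size()) {
    return nullptr;
  }
  return &mcCollisions[static_cast<std::size_t>(coll.mcCollisionId)];
}
```

The sketch assumes that both tables are written together, so that a row index in one stays valid for the other. With such a link, the particle-level jets to compare against a given detector-level jet are those of the linked event.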


void processMCD(soa::Filtered<aod::JetCollisionsMCD>::iterator const& collision,
soa::Join<aod::JetMcCollisions, aod::JMcCollisionPIs> const&, // join the weight
soa::Filtered<soa::Join<aod::JetTracks, aod::JTrackExtras, aod::JTrackPIs>> const& tracks)

Collaborator

why is JTrackPIs used?

@vkucera
Collaborator

vkucera commented Mar 5, 2026

@nzardosh If you request changes before merging, please convert to draft until comments are addressed.

@vkucera marked this pull request as draft on March 5, 2026 20:15

return;
}
histos.fill(HIST("h_mcCollMCD_counts_weight"), 1.5, eventWeight);
auto slimCollIndex = slimCollisions.lastIndex();

Collaborator

I believe the ordering is wrong here. You assign slimCollIndex to lastIndex here, but only fill the slimCollisions table in the next line. This means slimCollIndex will have the index of the previous collision, not the current one! Same comment for processMCD.

Author

Yes, you’re right — I’ve been doing it wrong, I fixed it, thanks
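
The effect of the ordering can be reproduced with a toy table. The sketch assumes a cursor whose `lastIndex()` returns the index of the most recently filled row, and -1 before any fill. `ToyTable`, `indexBeforeFill` and `indexAfterFill` are hypothetical stand-ins, not the O2 `Produces` class.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a table cursor: calling it appends a row,
// lastIndex() reports the index of the last appended row (-1 if empty).
class ToyTable
{
 public:
  void operator()(float posZ) { mRows.push_back(posZ); }
  std::int64_t lastIndex() const { return static_cast<std::int64_t>(mRows.size()) - 1; }

 private:
  std::vector<float> mRows;
};

// Wrong order: the index is read before the current row exists,
// so it points at the previous collision.
inline std::int64_t indexBeforeFill(ToyTable& table, float posZ)
{
  const std::int64_t index = table.lastIndex();
  table(posZ);
  return index;
}

// Right order: fill first, then read the index of the row just written.
inline std::int64_t indexAfterFill(ToyTable& table, float posZ)
{
  table(posZ);
  return table.lastIndex();
}
```

In the wrong order, tracks of the first collision would carry index -1 and every later collision's tracks would point one row too early.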

slimCollisions(collision.posZ());
auto slimCollIndex = slimCollisions.lastIndex();
for (const auto& track : tracks) {
if (!jetderiveddatautilities::selectTrack(track, trackSelection) && jetderiveddatautilities::selectTrackDcaZ(track, trackDcaZmax)) {

Collaborator

why do you use a dcaZ selection?

Author

I added the DCAz cut to suppress secondaries and improve track quality for the jet reconstruction.
Is there a specific reason why this would not be appropriate here? Otherwise I can remove it if you prefer to keep the selection minimal.

Collaborator

globalTracks I believe already includes a dcaZ cut

Author

yes indeed
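
Separately from whether the dcaZ cut is needed, the condition quoted in this thread negates only the first call, so it is true only for tracks that fail `selectTrack` and pass `selectTrackDcaZ`. The branch body is not shown, so the intent is an assumption here. If the branch is meant to skip every track that fails either selection, the negation would have to cover both terms. A standalone sketch with booleans standing in for the two calls (`skipAsQuoted` and `skipIfEitherFails` are hypothetical names):

```cpp
// passSel and passDca stand in for the results of selectTrack(...) and
// selectTrackDcaZ(...); both helpers are hypothetical illustrations.

// The condition as quoted: only the first term is negated.
constexpr bool skipAsQuoted(bool passSel, bool passDca)
{
  return !passSel && passDca;
}

// Skip when at least one selection fails (De Morgan: !a || !b).
constexpr bool skipIfEitherFails(bool passSel, bool passDca)
{
  return !(passSel && passDca);
}

// The two differ for tracks that fail the dcaZ selection.
static_assert(!skipAsQuoted(true, false) && skipIfEitherFails(true, false), "passes selectTrack, fails dcaZ");
static_assert(!skipAsQuoted(false, false) && skipIfEitherFails(false, false), "fails both");
```

With the default `trackDcaZmax` of 99 the second term is presumably almost always true, so the difference would rarely show up in practice.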

PROCESS_SWITCH(SlimTablesProducer, processData, "process collisions and tracks for Data and MCD", false);

void processMCD(soa::Filtered<aod::JetCollisionsMCD>::iterator const& collision,
soa::Join<aod::JetMcCollisions, aod::JMcCollisionPIs> const&, // join the weight

Collaborator

this is still there

float mass = jetderiveddatautilities::mPion;
float p = track.pt() * std::cosh(track.eta());
float energy = std::sqrt(p * p + mass * mass);
slimTracks(slimCollIndex, track.pt(), track.eta(), track.phi(), track.px(), track.py(), track.pz(), energy);

Collaborator

I still don't see any matching

Configurable<float> minEta{"minEta", -0.9, "min eta to save"};
Configurable<float> maxEta{"maxEta", 0.9, "max eta to save"};
Configurable<float> vertexZCut{"vertexZCut", 10.0f, "Accepted z-vertex range"};
Configurable<float> trackDcaZmax{"trackDcaZmax", 99, "additional cut on dcaZ to PV for tracks; uniformTracks in particular don't cut on this at all"};

Collaborator

this configurable isn't needed anymore


4 participants